Online-Academy
Look, Read, Understand, Apply

Data Mining And Data Warehousing

Association Analysis

Association analysis is the process of finding relations or associations between two or more objects, or between two or more attributes of objects. It is widely applied in business: a department store, for example, wants to know which products most customers purchase together. In marketing this is known as market basket analysis. Customers put different products into a basket while shopping, and knowing which items most customers put in the same basket is valuable information for the store. The store can place related products close together so that customers do not have to walk around the store to find what they need, and it can manage its inventory according to the sales of related items.
Several algorithms have been proposed for association analysis; the two most important are:
  • Apriori Algorithm
  • FP Tree (FP-Growth)

Apriori Algorithm

The Apriori algorithm is well suited to market basket analysis. It takes a transactional database as input and produces interesting associations between the products or objects. A minimum support and a minimum confidence for the associations (patterns) have to be specified for this algorithm.
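Confidence (A --> B) = (# tuples containing A and B) / (# tuples containing A)
Support (A --> B) = (# tuples containing A and B) / (Total # tuples)
A rule A --> B is considered interesting when its support and confidence are at least the specified minimum support and minimum confidence.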
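
As a minimal sketch of the two measures above, assuming each transaction is stored as a Python set of item labels. The helper names `support` and `confidence` and the four toy transactions are illustrative assumptions, not part of the original material.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Support of (A and B) divided by support of A, for the rule A --> B."""
    both = support(transactions, set(antecedent) | set(consequent))
    only_a = support(transactions, antecedent)
    # Guard against an antecedent that never occurs.
    return both / only_a if only_a else 0.0


# Hypothetical toy data, for illustration only.
toy = [{"A", "B"}, {"A"}, {"A", "B", "C"}, {"C"}]
print(support(toy, {"A", "B"}))        # A and B appear together in 2 of 4 transactions
print(confidence(toy, {"A"}, {"B"}))   # B appears in 2 of the 3 transactions with A
```

Dividing the two supports gives the same value as dividing the raw counts, because the total number of tuples cancels out.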

Apriori Algorithm:

  1. Consider the dataset, D, of transactions and specify the required minimum support for each item or combination of items
  2. Arrange the items in ascending order (A, B, C) or (I1, I2, I3)
  3. Get the set of single items and name it L0
  4. Find the frequency (count) of each item in the dataset, D
  5. Find the support of each item
  6. Find the items which satisfy the minimum support
  7. Make a new set, L1, of those items which satisfy the minimum support
  8. Take the cartesian product of L1 with itself (ignoring order and repeated items) to get the set of two-item combinations, L2
  9. Find the frequency and support of each two-item combination in L2
  10. Find the two-item combinations which satisfy the minimum support
  11. Make a set, L3, of those two-item combinations which satisfy the minimum support
  12. Continue in this way with sets of three or more items: find the frequency, calculate the support, and check whether each combination satisfies the minimum support. Combinations which satisfy the minimum support are interesting patterns and are considered for further pattern generation; those which do not are discarded from further consideration.
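
The level-wise procedure in the steps above can be sketched roughly as follows. This is a simplified illustration, not a tuned implementation: the function name `apriori` and the toy data are assumptions, and the prune step relies on the standard Apriori property (every subset of a frequent itemset is itself frequent), which the steps above do not spell out.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def frequent(candidates):
        # Count each candidate and keep those meeting the minimum support.
        result = {}
        for c in candidates:
            s = sum(1 for t in transactions if c <= t) / n
            if s >= min_support:
                result[c] = s
        return result

    # Level 1: every single item is a candidate.
    items = {i for t in transactions for i in t}
    level = frequent({frozenset([i]) for i in items})
    all_frequent = dict(level)
    k = 2
    while level:
        prev = set(level)
        # Join step: unions of two frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: drop candidates with an infrequent (k-1)-subset.
        candidates = {c for c in candidates
                      if all(frozenset(s) in prev for s in combinations(c, k - 1))}
        level = frequent(candidates)
        all_frequent.update(level)
        k += 1
    return all_frequent


# Hypothetical toy data, for illustration only.
toy = [{"A", "B", "C"}, {"A", "B"}, {"A", "C"}, {"B"}]
result = apriori(toy, min_support=0.5)
for itemset in sorted(result, key=lambda s: (len(s), sorted(s))):
    print(sorted(itemset), result[itemset])
```

The loop stops as soon as a level produces no frequent itemsets, since no larger combination can then be frequent.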

Example: Finding Interesting Association between products

Minimum Support = 25%. Total Number of records = 10
Transaction ID    Items
T1                A, B, C, D
T2                A, C
T3                C, A
T4                A, C, D
T5                B, D
T6                C, B, D
T7                A, D
T8                B, D
T9                B, C
T10               A, B, C

Single Item Set, L0

Items
A
B
C
D

Single items with their frequency (count)

Items    Frequency
A        6
B        6
C        7
D        6
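All of the items satisfy the minimum support of 25% (the lowest support is 6/10 = 60%), so all single items are interesting and L1 is the same as L0. Now get the set of two items, L2, by taking the cartesian product of L1 with itself (ignoring order and repeated items).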
Items
A, B
A, C
A, D
B, C
B, D
C, D
Get the frequency and support of each two-item combination in L2, as shown in the table:
Items    Frequency    Support
A, B     2            2/10 = 20%
A, C     5            5/10 = 50%
A, D     3            3/10 = 30%
B, C     4            4/10 = 40%
B, D     4            4/10 = 40%
C, D     3            3/10 = 30%
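Here, the interesting two-item combinations are (A, C), (A, D), (B, C), (B, D), and (C, D), as all of these have support greater than the minimum support of 25%. (A, B), with 20% support, is discarded. The interesting combinations form the set L3.
To find interesting three-item combinations, combine the two-item sets in L3 with each other and repeat the above steps.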
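
The counts in this example can be rechecked with a short script. The sketch below simply recounts every two-item and three-item combination directly from the ten transactions; it is a brute-force check, not the level-wise candidate generation itself, and the names `count_itemsets` and `MIN_SUPPORT` are illustrative.

```python
from collections import Counter
from itertools import combinations

# The ten transactions from the example table (T1 to T10).
transactions = [
    {"A", "B", "C", "D"}, {"A", "C"}, {"C", "A"}, {"A", "C", "D"},
    {"B", "D"}, {"C", "B", "D"}, {"A", "D"}, {"B", "D"},
    {"B", "C"}, {"A", "B", "C"},
]
MIN_SUPPORT = 0.25


def count_itemsets(transactions, size):
    """Count how many transactions contain each itemset of the given size."""
    counts = Counter()
    for t in transactions:
        # Sorting gives every itemset one canonical tuple, e.g. ("A", "C").
        for combo in combinations(sorted(t), size):
            counts[combo] += 1
    return counts


n = len(transactions)

pair_counts = count_itemsets(transactions, 2)
frequent_pairs = {p for p, c in pair_counts.items() if c / n >= MIN_SUPPORT}
for pair in sorted(pair_counts):
    print(pair, pair_counts[pair], f"{pair_counts[pair] / n:.0%}")

triple_counts = count_itemsets(transactions, 3)
frequent_triples = {t for t, c in triple_counts.items() if c / n >= MIN_SUPPORT}
print("Frequent three-item sets:", sorted(frequent_triples))
```

Running the same function with `size=3` shows which three-item combinations, if any, reach the 25% minimum support.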

Association Analysis

Association analysis is a descriptive data mining technique used to discover interesting relationships (associations) or patterns among items in large datasets. It's most commonly used in market basket analysis to find products that are frequently bought together.
Term          Meaning
Itemset       A collection of one or more items (e.g., {Chips, Coca_cola})
Support       How often an itemset appears in the dataset
Confidence    How often the rule is true (e.g., if X is bought, how often is Y too?)
Lift          How much more likely Y is bought when X is bought, compared to random chance
Rule Example:

Rule: {Milk, Bread} --> {Butter}

  • Support: % of transactions that include Milk, Bread, and Butter
  • Confidence: % of transactions with Milk & Bread that also include Butter
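  • Lift: Confidence divided by the support of Butter, i.e. by the confidence expected if {Milk, Bread} and Butter were bought independently; a lift above 1 means they are bought together more often than chance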
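
The three measures can be computed together, as in the rough sketch below. The five baskets are hypothetical data made up for illustration, and `rule_metrics` is an assumed helper name rather than a function from any library.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent --> consequent."""
    antecedent, consequent = set(antecedent), set(consequent)
    s_both = support(transactions, antecedent | consequent)
    s_a = support(transactions, antecedent)
    s_c = support(transactions, consequent)
    # Guards: a rule whose antecedent or consequent never occurs gets 0.0.
    conf = s_both / s_a if s_a else 0.0
    lift = conf / s_c if s_c else 0.0
    return s_both, conf, lift


# Hypothetical baskets, for illustration only.
baskets = [
    {"Milk", "Bread", "Butter"},
    {"Milk", "Bread"},
    {"Milk", "Bread", "Butter"},
    {"Bread", "Butter"},
    {"Milk"},
]
s, c, l = rule_metrics(baskets, {"Milk", "Bread"}, {"Butter"})
print(f"support={s:.2f} confidence={c:.2f} lift={l:.2f}")
```

With these baskets the lift comes out slightly above 1, so Butter is bought a little more often with Milk and Bread than it would be by chance.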